Cis-regulatory atlas of primary human CD4+ T cells

Cis-regulatory elements (CRE) are critical for coordinating gene expression programs that dictate cell-specific differentiation and homeostasis. Recently developed self-transcribing active regulatory region sequencing (STARR-Seq) has allowed for genome-wide annotation of functional CREs. Despite this, STARR-Seq assays are only employed in cell lines, in part, due to difficulties in delivering reporter constructs. Herein, we implemented and validated a STARR-Seq–based screen in human CD4+ T cells using a non-integrating lentiviral transduction system. Lenti-STARR-Seq is the first example of a genome-wide assay of CRE function in human primary cells, identifying thousands of functional enhancers and negative regulatory elements (NREs) in human CD4+ T cells. We find an unexpected difference in nucleosome organization between enhancers and NRE: enhancers are located between nucleosomes, whereas NRE are occupied by nucleosomes in their endogenous locations. We also describe chromatin modification, eRNA production, and transcription factor binding at both enhancers and NREs. Our findings support the idea of silencer repurposing as enhancers in alternate cell types. Collectively, these data suggest that Lenti-STARR-Seq is a successful approach for CRE screening in primary human cell types, and provides an atlas of functional CREs in human CD4+ T cells. Supplementary Information The online version contains supplementary material available at 10.1186/s12864-023-09288-3.


Background
Cis-regulatory elements (CRE) are DNA elements that manage or modulate gene transcription. Putative CREs are frequently identified in structural assays, such as those measuring chromatin accessibility (e.g., DNase sequencing [DNase-Seq] and assay for transposase-accessible chromatin sequencing [ATAC-Seq]), but these assays do not provide information about CRE function. CREs contribute to defining cell-specific gene expression programs through positive and negative regulation, and by insulating genes from inappropriate regulation. Annotation of CRE function is an ongoing challenge using existing datasets; the chromatin and genomic landscape of CREs is highly varied [1,2]. Multiple attempts to identify CREs from existing data rely on correlating epigenetic chromatin opening and histone enrichment with function [3][4][5][6]. Structure-based approaches fail to explain gene expression changes: in our recent study of T cell activation, genes with nearby chromatin opening were found to be both up-and down-regulated [7]. Recently, the ENCODE Project has released an encyclopedia of CREs (SCREEN), which leverages available chromatin modifications, chromatin immunoprecipitation sequencing (ChIP-Seq) binding experiments, and expression quantitative trait loci (eQTLs) to annotate putative regulatory elements [8,9]. Other attempts at annotating the regulatory genome were attempted with genome-wide Cas9 editing [10], and more narrowly with targeted Massively Parallel Reporter Assays [11]. Measuring CRE activity using self-transcribing active regulatory region sequencing (STARR-Seq) has proved to be adaptable and sensitive, making it a powerful assay in the functional genomic toolkit [12]. However, CRE imputation is limited in functional predictive power, as screening approaches, including STARR-Seq, largely overlook negative regulatory elements (NREs), such as silencers, attenuators, and insulators [13].
Enhancer functional screening has been the subject of numerous genome-wide investigations, including in some immune cell populations [14][15][16][17]. NREs, however, are historically the subjects of individual experimentation. Notably, NREs are uniquely important during the development of some immune cell populations, in which they suppress CD4 expression during thymic development [18]. Only recently have silencers been identified through novel genome-wide screening methods [11,14,19]. Such screens are limited by the use of a synthetic library, or the ability to identify only NREs and not enhancers. One recent report suggests that ATAC-STARR is able to detect both activating and repressing fragments, but this study was performed in a cell line and does not include functional validation [20]. There remains a need to functionally assess the regulatory potential of CREs in cell types of interest, particularly in mature, primary, non-transformed, human cells. CD4+ T cells are adaptive immune cells with important roles in human health and disease. It is known that CD4+ T cells are transcriptionally plastic, responding to cytokine stimuli with diverse and distinct transcriptional programs [21]. These large transcriptional changes are accompanied by precise opening and closing of chromatin; however, the functional status of these putative CREs remains unknown [7]. A limited number of studies have attempted to define CREs in CD4+ T cells, including using a synthetically constructed massively parallel reporter assay (MPRA) to identify small pathogenic single-nucleotide polymorphisms (SNPs) [22] and using pseudo genome-wide assays like CapSTARR-Seq (DNA hypersensitive site enrichment with capture) in P5424 murine thymocytes [23]. Others have proposed a CD4+ subtype specific enhancer atlas based on histone 3, lysine 27 acetylation (H3K27ac) and histone 3, lysine 4 monomethylation (H3K4me1) distribution [24]. Thus, there is an active need to comprehensively profile the function of the non-coding genome of relevant human cells, in particular human CD4+ T cells.
We implemented a STARR-Seq-based screen in human resting total CD4+ T cells using a non-integrating lentiviral transduction system. Our screen identifies nearly 8000 functional enhancers and 6000 functional NREs from a library of open chromatin. This assay is the first example of a genome-wide assay examining the accessible chromatin elements in human primary cells. Herein, we demonstrate that a modified STARR-Seq screening method can identify both enhancer and NREs with high specificity as validated by luciferase assays. We find that the STARR-Seq enhancers and NREs are marked with distinct profiles of histone modifications, including both canonically activating and repressive histone marks. Interestingly, enhancers and NREs display characteristic nucleosome positioning in their endogenous locations, and provide regulation of target genes via chromatin looping to target promoters. We provide supporting evidence that NREs may function as enhancers in other cell types, whereas functional enhancers are largely specific to hematopoietic lineages. We also provide a catalogue of transcription factors that may regulate enhancers and NREs in CD4+ T cells, providing hitherto known and unknown factors for future study.

Lenti-STARR-Seq in Primary Human CD4+ T Cells
We adapted a screening vector from the promoter-less STARR-Seq vector, which uses a bacterial origin of replication (ORI) as a cryptic promoter to initiate transcription ( Fig. 1A) [25]. This STARR-Seq screening vector was further cloned into a lentiviral backbone for viral packaging [pLenti-STARR]. An open chromatin library was prepared from a single donor's resting total human CD4+ T cells in 16X reactions with 100,000 cells each to ensure sufficient library diversity. The OMNI-ATAC protocol was employed to reduce the number of mitochondrial reads without the need for negative selection [26]. ATAC-Seq library fragments were gel purified (150-500 bp including adapters), cloned into linearized pLenti-STARR using NEBuilder HiFi assembly, and amplified in bacteria (Fig. 1B). We elected to use lentiviral particles with an HIV-1 envelope to allow for transduction of resting CD4+ cells [27,28]. A lentivirus packaging vector with mutated integrase, psPAX-D64V, was used to ensure that the STARR-Seq plasmid stays episomal, thus avoiding positional regulatory effects [28]. The Lenti-STARR-Seq library was transduced into human CD4+ T cells isolated from four separate adult, healthy donors. At least 50 million CD4+ cells were transduced per donor to ensure high fragment diversity in each biological replicate. Twenty-four hours after transduction, total RNA was harvested, reverse transcribed with a STARR transcript specific primer, PCR amplified, and next-generation sequenced. An input control library was amplified from the pLenti-STARR-Seq library.
Enhancers were called using MACS2, with the input plasmid library used as a control. Only enhancers overlapping peaks called from the input control were considered for downstream analysis. Although the weaker ATAC sites may function as enhancers within a STARR-Seq assay, we reason that the lower accessibility makes them less likely to be functional in their native genomic context. NREs, likely including silencers and insulator sequences, were detected using MACS2 with the plasmid as treatment and STARR-seq RNA as control. All genome-wide analysis was conducted with only Lenti-STARR Approach. A Putative regulatory sequences enriched by ATAC-Seq are used as input into Lenti-STARR-Seq. ATAC-Seq was performed on total human CD4+ T cells from peripheral blood obtained from healthy adult donors. The ATAC-Seq library is cloned into a promoter-less STARR-Seq screening vector that uses a bacterial origin of replication as a cryptic promoter [25]. The screening library is packaged in an HIV-enveloped non-integrating lentivirus, and transduced into resting CD4+ T cells. After 24 h in culture, STARR-Seq RNA is collected, reverse-transcribed using a transcript-specific primer, PCR-amplified, and prepared for next-generation sequencing (NGS). Created with BioRender. com B The Lenti-STARR-Seq screening vector carries an ATAC insertion site downstream of chimeric intron. C Enhancers are called with MACS2 [-log10(P) > 75], comparing STARR-Seq-obtained RNA reads to input plasmid reads [29]. Negative regulatory elements (NREs) are called using MACS2 [-log10(P) > 30] comparing input plasmid to STARR-Seq-obtained RNA reads [29]. Mean tag density is displayed, compared to 'All' accessible peaks from input. Reads are centered at CRE peak center. D Pearson correlation of tags across input ATAC-Seq peaks, Enhancers, or NREs for each STARR-Seq human CD4+ donor (N = 4) highly significant enhancers [-log10(P) > 75] and NREs [-log10(P) > 30] which intersected strongly accessible chromatin from peaks which overlap MACS peaks called from the input plasmid library alone.
STARR-Seq functional enhancers demonstrate strong central STARR RNA production, whereas NREs are depleted in the STARR-Seq library relative to all peaks from input plasmid control (Fig. 1C). Although the Lenti-STARR screening library is shared between donors, STARR RNA regulation appears consistent across the hCD4+ donors (Fig. 1D), as demonstrated by high Pearson correlation coefficients of read counts across peaks.

Enhancer and NRE element validation
We next sought to validate CRE function in traditional luciferase assays using randomly selected CRE across a variety of MACS2 significance and input accessibility levels ( Fig. 2A). Functional NREs in STARR-Seq were cloned upstream of a strong promoter (pCMV-ENH-LUC) and transfected into resting hCD4+ T cells. Luciferase activity was calculated relative to a Renilla luciferase (pRL-TK) transfection control. [Mann-Whitney test, N = 3 human CD4+ donors]; We find that the majority of putative NREs were functional in traditional luciferase assays, across varying significance and activity (fold change) levels, and across a wide array of endogenous genomic locations (intron, exon, near promoter, intergenic).
STARR-Seq-identified enhancers were cloned upstream of the minimal promoter (pGL4.26) luciferaseexpressing vector. Enhancers demonstrated strong luciferase activity, which was sometimes donor dependent (Fig. 2B). Similar to NREs, functional enhancers identified by STARR-Seq came from across various genomic locations (intron, intergenic, exon, near promoter). Some fragments displayed in Fig. 2 were tested in luciferase assays despite not reaching genome-wide significance.
We next estimated the true proportion of enhancers and NREs that are functional within the original screening library. The true proportion of CRE activity in the screen was estimated from the success rate of luciferase validation using binomial exact tests (Fig. 2C). It has been suggested that enhancers comprise between ~ 10% [30] and 35% [31] of accessible chromatin sites. We found that 27% of ATAC-Seq peaks in CD4+ T cells exhibited statistically significant enhancer activity by STARR. The estimated true proportion of these enhancers that are also luciferase active with a maximum-likelihood estimated at nearly 80%. STARR functional NREs comprise nearly 21% of the accessible genome in CD4+ T cells and also yielded an estimated true proportion of functional NREs of nearly 75% by luciferase validation. Although it is unknown what fraction of the accessible chromatin landscape NREs comprise, we observed overall that STARR-Seq performed similarly for enhancers and NREs, but may marginally outperform in detecting functional enhancers over functional NREs.

STARR-Seq CREs regulate transcription within their endogenous location
As expected, STARR-Seq enhancers and NREs display a distinct relationship with RNA transcriptional initiation and RNA polymerase. Assays, including Precision Run-On Nuclear Sequencing (PRO-Seq), capture nascent RNA production (bidirectional non-polyadenylated RNA) characteristic of active enhancer elements [32][33][34]. Shown in Fig. 3A, we find functional enhancers in STARR-Seq displayed strong central enrichment of nascent RNA production in PRO-Seq, whereas NREs were centrally depleted of such transcripts. Intriguingly, there was enrichment of the PRO-Seq signal within 100-200 bp around NREs, suggesting that enhancers may be located in surrounding regions. This suggests that although NREs are located in genomic areas generally permissive to eRNA transcription, transcriptional initiation is centrally repressed. Although the NREs may include silencers and insulators, the clear repression observed in the PRO-Seq signal suggests that this group of CRE is distinctly less transcriptionally permissive. NREs and enhancers both demonstrated RNA Polymerase II binding, though NREs exhibited weaker enrichment of S5-Phosphorylated PolII. These findings suggest that PolII may be recruited to enhancer elements bordering NREs but is less likely to be functionally active (Fig. 3B).
We next sought to characterize the effect of enhancers and NREs on expression of their target genes. First, we attempted to assign target genes to CREs on the basis of distance. Each CRE was assigned to the nearest TSS if within 10 kb (Fig. S1). We found that genes assigned enhancers, or both enhancers and NREs, had significantly higher TPM than genes associated with only open peaks from input.
Although assignment of genes to CREs by distance was used traditionally, multiple reports demonstrate that both enhancers and NREs loop from long distances to activate or repress their target promoters [19,38]. We assigned CRE gene targets leveraging Promoter Capture HiC (PC-HiC) performed in total CD4+ T cells [37]. Genes that are physically contacted by NREs are significantly repressed compared to those contacted by enhancers (Fig. 3C). Genes that are the targets of both enhancers and NREs or enhancers alone had significantly higher expression than genes that were targets of non-annotated ATAC-Seq peaks. We next assessed whether the regulation by CRE was dose-dependent on the number of CREs that target each promoter. We found that amongst genes only contacted by NRE, increasing the number of NRE to promoter contacts is not a significant negative predictor of gene expression A Twenty putative NREs are tested using CMV-Enhancer-LUC in DualGLO Luciferase assays (Promega). Relative luciferase activity of each CRE as tested in N = 3 independent human CD4+ donors is displayed. Relative Activity compares NRE activity to empty-vector CMV-Enhancer-LUC control activity. [(*) P-value ≤ 0.05 in one sided Mann-Whitney test] B Twenty putative enhancers are tested using pGL4.26 (minP promoter) in DualGLO Luciferase assays as described above. C Estimation of the true proportion of tested CREs which are functional in luciferase assays. The maximum likelihood is estimated from the binomial distribution given the number of significant luciferase CRE as tested (ß NREs : -23.66 ± 31.89, P-value = 0.44) ( Fig. 3D and E). The number of enhancers that contacted a gene was a positive predictor of expression (ß ENH : 50.03 ± 12.06, P-value = 3.75E-05). It is well known that enhancers are often cryptic and redundant [39,40], which supports our finding that genes with at least one enhancer association displayed significantly higher expression in Fig. 3C.

Lenti-STARR NREs and enhancers exhibit distinct patterns of chromatin modifications
We next sought to compare histone modification landscape across functional enhancers and NREs identified by STARR-Seq. Previous functional silencer screens report enrichment with chromatin modifications H3K27me3, H3K9ac, and H3K79me2, which are also relatively  [35]. Reads are centered at CRE peak center 'C' . B ChIP-Seq experiments of RNA Polymerase II Phospho-S5 and total RNA Polymerase II conducted in human CD4+ T cells [36]. C Chromatin looping targets of CRE to promoter with TPM from hCD4+ PolyA RNA-Seq [35,37]. [** Adjusted P-value ≤ 0.05 by non-parametric Kruskal-Wallis one-sided test with Holm adjustment for multiple comparisons]. D Number of CRE:Promoter PC-HiC contacts to each target gene, with TPM from hCD4+ PolyA RNA-Seq [35]. E Linear model explaining TPM by number of CRE:Promoter interactions to each promoter within each group of C [35] [** P-value ≤ 0.05] enriched at functional NREs compared to enhancers in this study (Fig. S2) [19,20,41,42]. Enhancers classically display enrichment with H3K27ac, H3K4me1, and sometimes H3K4me3, which we also observe at our STARR functional enhancers as well. Reports from other silencer screens of open chromatin find some enrichment for marks of heterochromatin [19,42]. Functional NREs identified in our assay were weakly enriched for H3K27me3 (Fig. 4A). We found that the traditionally active histone marks H3K27ac and H3K4me3 demonstrated divergent enrichment patterns at functional enhancers and NREs. H3K4me1, another mark of active enhancers, appeared similarly enriched at both NREs and enhancers. The distribution of H3K27ac and H3K4me3 histone ChIP-Seq tags suggested that NREs might be occupied by a nucleosome. To test this hypothesis, we imputed positions of nucleosomes from ATAC-Seq data using NucleoATAC [43]. Functional NREs were occupied by nucleosomes (or possibly nucleosome-size protein complexes), whereas enhancers seemed to be located between nucleosomes.
To test where functional enhancers and NREs reside within previously imputed chromatin states, we overlapped these elements with genomic segments identified by ChromHMM imputation based on 25 histone marks (Fig. 4B) [3]. In CD4+ T cells, STARR-Seq functional enhancers were more likely to be segmented as enhancer subtypes, or as active TSS. We also did not observe differences in CpG island enrichment within promoters between NREs and enhancers (Fig. S7B). NREs functional in STARR-Seq were more likely to be adjacent to promoters, particularly in downstream of TSS DNase accessible regions, which are likely intronic; enhancers are more likely found within annotated promoter regions (Fig. S7A). NREs displayed weaker enrichment in putative enhancer classes than did STARR-Seq functional enhancers. Though STARR-Seq enhancers were most likely to be segmented as enhancers in imputation models in hCD4+ T cells themselves, we found that these enhancers were also predicted as enhancer groups across hematopoietic cells, suggesting that some CD4+ enhancers are functional within a broader class of hematopoietic cells. Neither STARR-Seq enhancers nor NREs frequently segmented in regions of heterochromatin or ZNF repeats in mature CD4+ or hematopoietic lineages; mature CD4+ T cells exhibited weak heterochromatin chromatin signatures at sites of open chromatin [36]. Intriguingly, functional NREs also appear within accessible and classically activated chromatin environments. We also observe that the percentage of both NREs and enhancer elements that were devoid of histone modifications (quiescent) increased in non-hematopoietic cell types and stem cell lines, suggesting a progressive opening and activation of these CREs during hematopoietic development.
Both STARR-Seq identified enhancers and NREs were enriched within related cell types' enhancer predictions in FANTOM5 (Fig. 4C) [45]. Consistent with other silencer reports, we found that NREs could convert into enhancers in unrelated cell types on the basis of FANTOM5 enhancer annotations, DNase accessibility, and enhancer chromatin marks H3K4me1 and H3K27ac (Fig. 4D, E and F). Though we found that functional NREs are annotated as enhancers using their chromatin environment, additional experiments are needed to demonstrate that NREs are in fact cross-functional as enhancers in other cell types.

Transcription factor binding across Lenti-STARR cis-regulatory elements
In order to identify putative transcription factors that interact with the regulatory elements that we identified, we examined the overlap between these elements and a collection of ChIP-Seq experiments from various cell types available in GEO database using the RELI algorithm [44]. Functional enhancers were likely to be bound by numerous, classic enhancer activating factors (Fig. 5A). For example, CCAAT/enhancer binding protein zeta (CEBPZ) demonstrated stronger enrichment in enhancers versus NREs. We found that lymphoid enhancer binding factor 1 (LEF1) had strong enrichment at enhancers compared to NRE in epithelial carcinoma cells. LEF1 also has lymphocyte-specific functions: it promotes expression of some conventional and regulatory Th cell genes, while repressing CD8+ T cell-specific cytokine gene expression [46,47]. We observe that STARR-Seq functional enhancers are enriched with binding by some activation-inducible transcription factors: NFATC1, NFATC2, FOS (AP-1), and NFκB1 ( Fig. S3A and B). This suggests that at least some enhancers are capable  [35]. Nucleosome location in resting CD4+ T cells imputed from NucleoATAC [43]. Tag density of mean signal intensity displayed for Histone enrichment, and of imputed nucleosome position. Reads are centered at CRE peak center 'C' . B Proportion of functional CD4+ STARR-Seq Enhancers and NREs within ChromHMM Segmentation of 25 imputed groups [4]. Other cell groups displayed as percent of total. C RELI enrichment logP-value of available enhancer predictions in FANTOM5 database [44]. Coverage ratio is the proportion of target sites overlapping all sites. D RELI enrichment logP-value of STARR-Seq identified elements across available DNase-Seq experiments, E H4K1me1 experiments, and F H3K27ac experiments [44]  of participating in, or become subsequently turned-on during CD4 activation. Constitutively expressed factors, including ETS family member FLI1, also were enriched within enhancer motifs (Fig. 5C). FLI is known to bind enhancers in T cells [48], and other immune cell subtypes [megakaryocytes [49]], and can induce T cell leukemia if overexpressed [50]. The FLI portion of FLI-Ewing's Sarcoma fusion proteins was characterized as an activator of enhancers, and activated reporters in luciferase assays [51]. Another putative enhancer binder that we identified was CD74, a cell surface marker which was recently shown to also possess transcriptional activation activity [52,53]. We explored putative co-regulators of NREs by leveraging a catalogue of ChIP-Seq experiments, again performed in various cell types (Fig. 5B). Numerous histone modifiers are likely to play a role in NRE function; NREs demonstrated enrichment for lysine demethylases, including KDM2B and KDM4A, which are recruited to promote demethylation at H3K4, H3K9, and H3K36 residues. KDM4A demethylates both H3K36me3 [55] and H3K9me3, and recruits NCOR (nuclear core repressor) [56]. Other canonical repressors were also enriched in NREs, including members of polycomb repressor complex 1 (RNF2, RYBP) and polycomb repressor complex 2 (EZH2, SUZ12, EED).
NREs may also be recruiters of chromatin remodelers. INO80 member SRCAP was recruited to NREs in human CD4+ T cells (Fig. S3), and is a chromatin remodeler recruited by YY1 and other TFs. These findings support a model in which the accessibility changes and remodeling nucleosome position seen in Fig. 4A may be a result of chromatin remodeler recruitment to NREs. Next, we conducted motif analysis to find transcription factor motifs enriched at CREs (Fig. 5). Similar to other reports, we found that STARR-Seq identified putative insulators in the negatively regulated fraction of screened sequences [14]. We observed that the CTCF paralogue, CTCFL (Boris), motif was the most enriched motif at NREs, followed by the ETS family motif, which was highly enriched across all open chromatin sites in CD4+ T cells. (Fig. 5D). The function of CTCF is likely dual purpose; enhancers were also enriched for the CTCF paralogue, CTCFL (Boris), which is known to facilitate long-range enhancer and super-enhancer looping to target gene promoters [57,58]. Other enriched motifs within both enhancers and NREs belonged to families with mixed activating and repressing function, including the RUNX and ETS families (Fig. 5C) [59,60]. Enhancers were also enriched for multiple activating transcriptional activators, including NFY, which frequently binds cell-specific enhancers [61], and SP5. We also found motif enrichment for transcription factors related to CD4 activation, suggesting that some of these enhancers are functional during CD4 activation (Jun).

Discussion
We describe a screen identifying functional enhancers and NREs from a library of open chromatin in resting human CD4+ T cells. Using a STARR-Seq approach, we identified and validated the function of CREs in human primary CD4+ T cells. We found that Lenti-STARR-Seq can identify enhancers and NREs, each with nearly 80% sensitivity, as verified by luciferase assays. We described the chromatin landscape of both enhancers and NREs, in which enhancers were marked with H3K27ac and H3K4me1. They were also enriched for binding of enhancer activating proteins (CEBPZ). Intriguingly, we found that NREs tend to have central nucleosome placement, unlike enhancers which are located between nucleosomes. NREs and enhancers also function to regulate gene expression via long-range chromatin interactions. In short, we report thousands of functional enhancers and NREs from a screen of CRE function in primary human CD4+ T cells.
Enhancers have long been known to bind activating factors, including P300 and CEBP(Z), among others. We also found in other cell types that CD4 enhancers could bind LEF, TCF, and novel, recently identified enhancer activators like CD74. Integration of chromatin accessibility and chromatin modification data across cell types suggests that many CD4 functional enhancers are broadly active across hematopoietic lineages. Indeed, many hematopoietic cells have overlapping repertoires of transcription factors that may bind these elements. Although many enhancers are likely shared between hematopoietic lineage cells, these findings invite further investigation to refine enhancer specificity within lymphoid cell subtypes, particularly during developmental transitions (e.g. CD8+ CD4+ doublepositive to CD4+ CD8-single-positive cells), and during activation events in immune cells. We also find that CD4+ enhancers are capable of being bound by inducible transcription factors following activation (Fig. S3). Given that our experiments were performed in resting cells, it remains to be shown whether these enhancers are repurposed during activation by these inducible factors or whether lentiviral transduction may be weakly activating. We conducted this STARR-Seq screen using preparations of total CD4+ T cells. This cell isolation pool is comprised of numerous CD4+ T cell subtypes, many of which we expect exhibit unique CRE function. Though it is possible to glean subtype-specific CRE function based on accessibility and chromatin modifications, we cannot rule out that the STARR-Seq functional CRE identified herein are functional in only some CD4+ subtypes. This limitation is an ongoing technical challenge for screening methods, including STARR-Sequencing, as it requires high read-density coverage and tens of millions of cells per biologic replicate. With the advent of single-cell sequencing and single-cell CRISPR screens, these limitations may become less technically and fiscally burdensome.
Broadly, two mechanisms of silencer action have been proposed: 1) competing DNA binding between activating and repressing proteins and 2) recruitment of negative regulators, such as HDACs, with subsequent formation of heterochromatin domains [62,63]. Although we do observe local heterochromatin mark enrichment at functional NREs, use of an accessible chromatin library as input precludes analysis of functional NREs within expansive heterochromatin marked domains. Rather, we demonstrate that functional NREs can exist in open chromatin and even within 'activated' chromatin environments. These observations suggest some competition between transcription factors with opposing functions. The presence of PolII at NREs, but weaker enrichment of activated S5-PolII at NREs compared to enhancers, suggests that regulation of polymerase may be a silencing mechanism used by at least some NREs. Though we did not identify a central unifying repressor in CD4+ T cells, we found that these NREs were likely to recruit transcription factors and coactivators with known repressor and nucleosome-repositioning function. Other silencer screens corroborate this widely varied transcription factor and motif utilization across the functional silencer elements; no unifying chromatin marks or transcription factor binding profile of functional NREs has been identified to date [11,19,20,42,63].
DNA methylation is an important epigenetic regulator of transcription [64,65]. It is known that both activators and repressors are recruited to sites of DNA methylation including methyl CpG binding protein 2 (MECP2), which can function as both an activator and repressor of gene expression [66]. Intriguingly, both enhancers and silencers were reported to lie within CpG islands [42,67,68]. We also demonstrate that overlap with CpG islands is not different between NREs and enhancers (Fig. S7B). We cannot rule out that some elements identified in our screen mechanistically rely upon DNA methylation for either enhancer or NRE function, which may be an area for future study.
We unexpectedly observe that classically activating histone marks (H3K27ac, and H3K4me3) are enriched at NRE. Due to central enrichment of H3K4me3 at NRE, we assessed the proximity of NREs with promoters. NREs are more likely to be adjacent to promoters, particularly downstream of TSS within DNase accessible regions (intronic); enhancers are more likely found within annotated promoter-TSS regions (Fig. S7A). Enhancers, in fact, are demonstrably active promoters, which leads to detection of bi-directional RNA transcription originating from enhancer CRE by PRO-Seq. We demonstrate that PRO-Seq signal is enriched at the identified enhancers, and centrally depleted at functional NREs. NREs also function to actively repress target gene expression and are not merely inactive promoters (Fig. 3).
This screen demonstrates striking differences in patterns of epigenetic enrichment between enhancers and NREs. Other screens have identified functional silencer activity to be associated with H3K36me3 and H3K9ac, marks of active transcription and enhancers. We also find H3K9ac enrichment at NREs (Fig. S3C); analysis of functional silencers in K562 cells by Pang et al. 2020 also found that silencers were marked with H3K27ac and alternate histone H2A.Z. Intriguingly, some STARR-Seq NREs are positioned within active chromatin environments, including super-enhancers, although only STARR-Seq enhancers are significantly enriched within super-enhancers (Fig. S4). Previously, a detailed analysis of one super-enhancer found that some of its constituent elements function as silencers [69]. These multimodal ATAC-Seq sites may be multifunctional, providing both positive and negative regulatory information to nearby genes in different cellular contexts. Further study is required to refine the in situ function of these complex enhancer:NRE relationships at the level of individual genes.
We observe that a large number of functional NRE are occupied centrally by nucleosomes, as imputed by NucleoATAC, and confirmed by histone modification ChIP-Seq [43]. These nucleosome occupied fragments are isolated in the input ATAC-Seq library; this is not surprising given that Tn5 is likely to cleave target DNA in the areas between nucleosomes which produces nucleosome-containing fragments upon PCR. Furthermore, each ATAC-Seq peak represents composite accessibility: the isolated DNA is prepared from millions of cells which possess a variety of nucleosome positions. Central nucleosome placement can facilitate repressive TF recruitment to DNA motifs; indeed, many TF families demonstrate preferred binding to nucleosomes over free DNA [70].
Our approach queried the function of accessible CREs for enhancer and negative regulatory activity from a library of open chromatin; this assay excluded interrogation of CRE within heterochromatin and inaccessible chromatin environments. Screens for silencers from libraries of repressed chromatin, performed in Drosophila and other Hi-C experiments, suggest that at least some CREs within H3K27me3 domains are also functional as NREs [11,38,42]. Intriguingly, we also found some weak central enrichment of heterochromatin mark H3K27me3 at functional NREs in CD4+ T cells, despite a paucity of this mark at sites of open chromatin in mature CD4+ T cells. We also demonstrate that NREs and enhancers display key differences in local chromatin accessibility. STARR-Seq functional NREs possess a centrally positioned nucleosome, which may drive the strong central enrichment that we observed for the histone marks H3K27ac and H3K4me3 (Fig. S5). It is known that nucleosome repositioning by transcription factors can be repressive when placed in unfavorable locations for transcription [71]. A direct investigation of silencers that form heterochromatin found that silencing requires precise nucleosome positioning and can be sensitive to nucleosome repositioning [72]. Other investigations find that nucleosome positioning is important for silencing of heterochromatically marked genes more broadly [73,74]. As our assay tests putative CREs outside of their native chromatin environment, some NREs may contain intrinsic nucleosome-positioning information, or be permissive to nucleosome occupancy. These possibilities could support a potential silencing mechanism related to local chromatin accessibility changes that prevent transcriptional elongation. Alternatively, STARR-Seq may uncover some functional silencers that are endogenously occupied by nucleosomes. Intriguingly, we demonstrate that these NREs are functional from numerous genomic locations: intronically (STARR-Seq), upstream of a promoter (Luciferase), and in long-range chromatin looping in situ (Fig. 3E). Together this suggests that the silencing mechanism in hCD4+ cells is not limited to unfavorable nucleosome placement blocking intron transcription.
Others have speculated that NREs may have alternate enhancer functions in unrelated cell types. We also demonstrate that hCD4+ functional NREs resemble the enhancer phenotype in other cells due to their enrichment in FANTOM5, and enrichment of the enhancer histone marks H3K4me1 and H3K27ac. Though functional evidence is needed to support this hypothesis, we posit on the basis of our own findings that NRE function may also be incorrectly assumed as enhancing. Although it is tempting to ascribe enhancer activity to histone mark enrichment alone, we and others demonstrate that functional NREs exist within chromatin environments traditionally associated with enhancers. This challenge extends to many enhancer and silencer imputation attempts, which mainly leverage chromatin state. Indeed, we found that imputed silencer location in CD4+ T cells from SilencerDB had poor enrichment of Lenti-STARR NREs and enhancers (Fig. S6) [75]. We emphasize the need for further investigation into NRE behavior across developmental time periods and transitions, and specifically into whether NREs are repurposed as enhancers or simply maintain repressive function.

Conclusions
We implemented a STARR-Seq-based screen in human resting total CD4+ T cells using a non-integrating lentiviral transduction system. This screen annotates thousands of newly identified functional cis-regulatory elements in a primary human cell, and was validated using both traditional luciferase assays, and by assessing their regulation of target gene expression. We also explore the chromatin and transcription factor landscape of functional CRE, finding that NREs display characteristic central nucleosome positioning and can also function within histone environments marked by H3K27ac.
This study is an important step in characterizing the functional cis-regulatory genome in human primary cells, given that cell specific patterns of gene expression are a result of cell specific CRE function. Whereas previous assays for enhancer and silencer function are performed in cell lines or other model organisms, we demonstrate a STARR-Seq approach is useful even in difficult to transfect primary cells. These data are of great interest to study of lymphocyte function, where there is longstanding interest in identifying CRE responsible for control of gene expression.
The vector was digested with BspDI and KpnI and gel purified. A custom double-stranded (ds)DNA gBlock containing the origin of replication (ORI) and PolyA sequence was obtained from IDT (Supplemental Table 1), modeling the hORI-STARR screening vector generated by the Stark laboratory (Addgene plasmid #99296) [25]. Immediately following cloning, the original ORI was excised using a Site Directed Mutagenesis kit (NEB E0554S). The resulting pLenti-STARR vector was transformed (NEB C3040H) and isolated using Endotoxin Free Maxi-Prep (Zymo D4203).

Lenti-STARR Library Preparation
The STARR-Seq inset library was created according to the ATAC-STARR-Seq approach [15]. Briefly, OMNI ATAC-Seq was performed as previously described with 1,600,000 human CD4+ T cells (in total) in batches of 100,000 cells per reaction (NEB #M0544S) [26]. The ATAC-Seq reaction products were cleaned up using the Qiagen Reaction Cleanup Kit (Qiagen #28206) and eluted in 12 µL per 100,000 cells transposed. Each resultant elution was amplified in 50 µL with custom primers (Supplemental Table 1) for 10 total cycles, as described previously [15]. The amplified library was purified using a PCR Cleanup Kit (Qiagen #28104) and size selected for 150-500 bp fragments on a 2% agarose gel, using 300 mg per column (NEB #T1020L). The eluted ATAC-Seq library was cloned into AgeI-and SalI-digested pLenti-STARR using a 3:1 molar ratio (insert:backbone) in a total reaction volume of 100 µL using the NEBuilder HIFI DNA Assembly Kit (NEB #E5520S

Lenti-STARR library construction
Total RNA was purified from transduced cells (Qiagen #75144) and concentrated 7.5X using an RNA Cleanup Kit (NEB #T2040L). RNA was reverse transcribed according to manufacturer guidelines with STARR-Seq transcript-specific primers (Supplemental Table 1) with SuperScript IV (Invitrogen #18090010), using twice the recommended RNA input. cDNA was purified using a Reaction Cleanup Kit (Qiagen #28206) and amplified using custom STARR-Seq transcript-specific primers (Supplemental Table 1) as previously described [15]. Diluted Lenti-STARR-Seq plasmid 'input' control was amplified using an input concentration that yielded the same amplification cycle number as the STARR library for hCD4+ Donor I. PCR amplified library was purified using PCR cleanup columns (Qiagen #28206) and quantified for PE-150 bp sequencing (Novogene).

Alignment and next-generation sequencing processing
Samples were aligned to hg19 in Scientific Data Analysis Platform (SciDAP, https:// SciDAP. com) with the Trim-ChIP-PE processing pipeline (https:// github. com/ datir ium/ workf lows/ blob/ master/ workf lows/ trim-chips eqpe. cwl) [76]. Briefly, reads were trimmed and aligned to hg19, and bam files were produced. All mitochondrial reads were removed. Read count summary and alignment statistics are available in Supplemental Table 4.
The per duplication count of each unique read for each biologic donor was fitted to the negative binomial distribution for each biologic replicate, and for the input plasmid library. Duplicate reads exceeding the 75th percentile were removed from each sample by MACS2 to reduce false discovery of enhancers resulting from PCR duplication. Enhancers were called using MACS2 in which the plasmid reads were used as a control, [macs2 callpeak -f BEDPE -keep-dup all -g 2.7E9], and with -log10(P-value) greater than 75. NRE were also detected with MACS2, using the plasmid reads as treatment and the STARR RNA as control [macs2 callpeak -f BEDPE -keep-dup all -g 2.7E9], with -log10(P-value) greater than 30. Largely similar results can be obtained using FAST-NR [77]. Both enhancers and NRE are retained only if intersecting an ATAC-Seq narrowPeak called using MACS2 from the plasmid reads alone.
All final values and CRE locations are listed in Supplemental Table 2. Detailed data processing script can be found at https:// github. com/ Barski-lab/ Lenti-STARR
[ComputeMatrix reference-point -referencePoint center] where each row is centered by the CRE peak center. Figures labelled 'C' due to space constraints indicate the reference point is centered by the CRE peak center.

Motif enrichment
De Novo Motif enrichment performed using HOMER findMotifsGenome.pl was run with default parameters [54].

Nucleosome position imputation
Reads mapped to the input plasmid to ATAC-STARR were sequenced as described above. Bam files were processed with NucleoATAC to produce nucleosome occupancy scores and converted to a .bigWig for heatmap visualization [43].

Luciferase assays
Putative regulatory elements were randomly selected from all MACS2 identified peaks of STARR-Seq functional enhancers and NREs. The sequences for functional testing were selected on the basis of fragment distribution in the control ATAC library and sought to include the input ATAC-Seq peak center. gBlocks or eBlocks were synthesized (IDT) ranging from 150-400 bp in length and cloned into pGL4.26 (Promega #8441) for enhancer screening, or CMV-Enh-Luc (Addgene #45968) for NREs screening (All tested sequences included in Supplemental Table 1). Total human CD4+ T cells were purified as described above. Each transfection reaction consists of 20 µg of testing vector and 500 ng of Renilla-expressing phRL-TK (Promega #6251) mixed with 3.5E6 cells in buffer T using 100 µL Neon Tips (Invitrogen #MPK10025) [2200 V, 20-ms pulse width, 1X pulse]. Cells were cultured for 24 h, collected, and suspended in 60 µL DPBS for Nano-Glo Dual Luciferase Assay, according to the manufacturer's instructions (Promega #N1610). Absorbances were read using the Dual-Nano-Glo protocol with a GloMax plate reader (Promega #GM3000) in flat-bottom, opaque, white, 96-well plates (Corning #3912). Luciferase signal was averaged by donor across the technical transfection replicates. Control transfection (empty vector) was performed for each CD4+ T cell donor and plasmid condition (N = 3 hCD4+ Donors per CRE performed on separate days). [one sided Mann-Whitney test]. The maximum likelihood is estimated from the binomial distribution given the number of significant luciferase CREs tested. Plotted are the maximum likelihood estimates of y sucesses (y = number of significant CRE by luciferase testing) from n trials (n = 20 CRE tested) across a range of probabilities from 0 to 1 (p = [0,1]). Modeling was performed in R using the following script: [Define likeli.plot = function(y,n) where L = function(p) dbinom(y,n,p); mle = optimize(L, interval = c(0,1), maximum = TRUE)$max and p = (1:100)/100; plot(p, L(p))].

RNA expression values [transcripts per million (TPM)]
were downloaded from PolyA RNA-Seq, performed in total CD4+ T cells [35]. The Nearest TSS of STARR-Seq CRE were annotated by Homer and retained if within 10 kb of the CRE [54]. Genes were assigned to the following bins: targets of enhancers alone, NREs alone, both enhancers and NREs, or ATAC-Seq peak association alone. CRE and target promoter assignments were annotated using Promoter-Capture Hi-C performed in total CD4+ T cells [37]. CRE were intersected (bedtools intersect), and only high-confidence chromatin interactions were retained (CHiCAGO score greater than 5) [79]. Statistical differences in target gene expression between CRE groups were assessed with a non-parametric, Kruskal-Wallis one-sided test with Holm adjustment for multiple comparisons. A linear model explaining TPM by the number of CRE:Promoter contacts was used to test for significance between gene expression and the number of enhancers, NREs, or only ATAC-Seq peak contacts for genes only contacted by the indicated CRE type. The Beta slope coefficients of each independent variable in the linear model Y TPM = ß 0 + ß*X NREs , and Y TPM = ß 0 + ß*X Enhancers (where X is the number of elements contacting target gene promoter) are presented along with standard error. Modeling was performed in R using the following script: [glm(TPM ~ NRE) and glm(TPM ~ ENH)].

RELI analysis
We used the RELI method as a tool to estimate the significance of intersection between the identified CRE and published ChIP-Seq binding experiments. Briefly, the RELI algorithm, described in detail previously, was used to enrich 10,981 ChIP-Seq experiments of transcription factors and 2634 non-transcription factor datasets (histone modification ChIP-Seq, FANTOM5 enhancer annotations, and DNaseI experiments) [44]. RELI was run for 1000 iterations, using an open chromatin null model in hg19, without peak extension. [reli-batch-sub -rep 1000 -species hg19 -null OpenChrom]. This yields an adjusted P-value representing the significance of intersection between the identified CRE and each experiment [Bonferroni correction].

Statistical analysis
All statistical tests were performed in R and detailed in the appropriate methods sections and figure legends. (R version 4.2.2).